Abstract — We present a novel technique for encoding and
decoding constant weight binary codes that uses a geometric
interpretation of the codebook. Our technique is based on 
embedding the codebook in a Euclidean space of dimension equal 
to the weight of the code. The encoder and decoder mappings are 
then interpreted as a bijection between a certain hyper-rectangle 
and a polytope in this Euclidean space. An inductive dissection 
algorithm is developed for constructing such a bijection. We prove 
that the algorithm is correct and then analyze its complexity. 
The complexity depends on the weight of the code, rather than 
on the block length as in other algorithms. This approach is 
advantageous when the weight is smaller than the square root 
of the block length. 

Index Terms — Constant weight codes, encoding algorithms, 
dissections, polyhedral dissections, bijections, mappings, Dehn 
invariant. 



I. Introduction 

We consider the problem of encoding and decoding binary 
codes of constant Hamming weight w and block length n. 
Such codes are useful in a variety of applications: a few 
examples are fault-tolerant circuit design and computing [15], 
pattern generation for circuit testing [24], identification cod- 
ing [26] and optical overlay networks [25]. 

The problem of interest is that of designing the encoder and 
decoder, i.e., the problem of mapping all binary (information) 
vectors of a given length onto a subset of length-n vectors 
of constant Hamming weight w in a one-to-one manner. In
this work, we propose a novel geometric method in which 
information and code vectors are represented by vectors in w- 
dimensional Euclidean space, covering polytopes for the two 
sets are identified, and a one-to-one mapping is established by 
dissecting the covering polytopes in a specific manner. This 
approach results in an invertible integer-to-integer mapping, 
thereby ensuring unique decodability. The proposed algorithm 
has a natural recursive structure, and an inductive proof is 
given for unique decodability. The issue of efficient encoding 
and decoding is also addressed. We show that the proposed 
algorithm has complexity $O(w^2)$, where w is the weight of
the codeword, independent of the codeword length. 

Dissections are of considerable interest in geometry, partly 
as a source of puzzles, but more importantly because they are 
intrinsic to the notion of volume. Of the 23 problems posed by 
David Hilbert at the International Congress of Mathematicians 
in 1900, the third problem dealt with dissections. Hilbert asked 
for a proof that there are two tetrahedra of the same volume 
with the property that it is impossible to dissect one into a
finite number of pieces that can be rearranged to give the 
other, i.e., that the two tetrahedra are not equidecomposable. 
The problem was immediately solved by Dehn [7]. In 1965, 
after 20 years of effort, Sydler [23] completed Dehn's work. 
The Dehn-Sydler theorem states that a necessary and sufficient 
condition for two polyhedra to be equidecomposable is that 
they have the same volume and the same Dehn invariant. This 
invariant is a certain function of the edge-lengths and dihedral 
angles of the polyhedron. An analogous theorem holds in 
four dimensions (Jessen [11]), but in higher dimensions it is 
known only that equality of the Dehn invariants is a necessary 
condition. In two dimensions any two polygons of equal area 
are equidecomposable, a result due to Bolyai and Gerwien (see
Boltianskii [1]). Among other books dealing with the classical 
dissection problem in two and three dimensions we mention 
in particular Frederickson [8], Lindgren [13] and Sah [19]. 

The remainder of the paper is organized as follows. We
provide background and review relevant previous work in
Section II. Section III describes our geometric approach and gives
some low-dimensional examples. Encoding and decoding
algorithms are then given in Section IV, and the correctness of
the algorithms is established. Section V summarizes the paper.



II. Background and Previous Methods 

Let us denote the Hamming weight of a length-$n$ binary
sequence $s := (s_1, s_2, \ldots, s_n)$ by $w(s) := |\{s_i : s_i = 1\}|$,
where $|\cdot|$ is the cardinality of a set.

Definition 1: An $(n,w)$ constant weight binary code $C$ is
a set of length-$n$ sequences such that any sequence $s \in C$ has
weight $w(s) = w$.

If $C$ is an $(n,w)$ constant weight code, then its rate $R :=
(1/n)\log_2 |C| \le R(n,w) := (1/n)\log_2 \binom{n}{w}$. For fixed $\beta$ and
$w = \lfloor \beta n \rfloor$, we have

$$\bar{R} := \lim_{n \to \infty} R(n,w) = h(\beta), \tag{1}$$

where $h(\beta) := -\beta \log_2(\beta) - (1-\beta)\log_2(1-\beta)$ is the
entropy function. Thus $\bar{R}$ is maximized when $\beta = 1/2$, i.e.,
the asymptotic rate is highest when the code is balanced.

The (asymptotic) efficiency of a code relative to an
infinite-length code with the same weight to length ratio $w/n$, given by
$\eta := R/\bar{R}$, can be written as $\eta = \eta_1 \eta_2$, where
$\eta_1 := R/R(n,w)$ and $\eta_2 := R(n,w)/\bar{R}$. The first term, $\eta_1$,
is the efficiency of a particular code relative to the best possible code
with the same length and weight; the second term, $\eta_2$, is the efficiency
of the best finite-length code relative to the best infinite-length
code.

Fig. 1. Efficiency $\eta_2$ as a function of block length when $\beta = 1/2$.

From Stirling's formula we have

$$\eta_2 \approx 1 - \frac{\log_2\bigl(2\pi n \beta(1-\beta)\bigr)}{2 n h(\beta)}. \tag{2}$$

A plot of $\eta_2$ as a function of $n$ is given in Fig. 1 for $\beta = 1/2$.
The slow convergence visible here is the reason one needs 
codes with large block lengths. 
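As a rough numerical illustration of this slow convergence, the following Python sketch compares the exact ratio of $(1/n)\log_2\binom{n}{w}$ to $h(w/n)$ with the Stirling-type approximation. The function names are ours and purely illustrative, and the loop assumes the balanced case $w = n/2$ with $n$ even.

```python
from math import comb, log2, pi


def entropy(beta):
    """Binary entropy h(beta) in bits, for 0 < beta < 1."""
    return -beta * log2(beta) - (1 - beta) * log2(1 - beta)


def eta2_exact(n, w):
    """Exact ratio R(n, w) / h(w/n), with R(n, w) = (1/n) log2 C(n, w)."""
    return log2(comb(n, w)) / (n * entropy(w / n))


def eta2_stirling(n, beta):
    """Approximation 1 - log2(2*pi*n*beta*(1-beta)) / (2*n*h(beta))."""
    return 1 - log2(2 * pi * n * beta * (1 - beta)) / (2 * n * entropy(beta))


# Balanced codes (beta = 1/2) at a few block lengths.
for n in (16, 64, 256, 1024):
    print(n, round(eta2_exact(n, n // 2), 4), round(eta2_stirling(n, 0.5), 4))
```

Both columns approach 1 only slowly as $n$ grows, which is the behavior described above.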

Comprehensive tables and construction techniques for bi- 
nary constant weight codes can be found in [2] and the 
references therein. However, the problem of finding efficient 
encoding and decoding algorithms has received considerably 
less attention. We briefly discuss two previous methods that 
are relevant to our work. The first, a general purpose technique 
based on the idea of lexicographic ordering and enumeration 
of codewords in a codebook (Schalkwijk [20], Cover [3]) is an
example of ranking/unranking algorithms that are well studied 
in the combinatorial literature (Nijenhuis and Wilf [14]). 
We refer to this as the enumerative approach. The second 
(Knuth [12]) is a special-purpose, highly efficient technique 
that works for balanced codes, i.e., when $w = \lfloor n/2 \rfloor$, and
is referred to as the complementation method. 

The enumerative approach orders the codewords
lexicographically (with respect to the partial order defined by $0 < 1$),
as in a dictionary. The encoder computes the codeword from
its dictionary index, and the decoder computes the dictionary
index from the codeword. The method is effective because
there is a simple formula involving binomial coefficients
for computing the lexicographic index of a codeword. The
resulting code is fully efficient in the sense that $\eta_1 = 1$.
However, this method requires the computation of the exact
values of binomial coefficients $\binom{n}{k}$, and requires registers of
length $O(n)$, which limits its usefulness.
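To make the enumerative approach concrete, here is a minimal Python sketch of lexicographic ranking and unranking of constant weight words with $0 < 1$. It is a standard textbook formulation and is not claimed to reproduce the exact indexing conventions of [20] or [3]; the names `rank` and `unrank` are illustrative. Python integers are unbounded, which hides the register-length issue noted above.

```python
from math import comb


def rank(bits):
    """Lexicographic index (0 < 1) of `bits` among words of equal length and weight."""
    n, ones_left, index = len(bits), sum(bits), 0
    for pos, b in enumerate(bits):
        if b:
            # Words that agree so far but have a 0 here come first; they must
            # place all remaining ones in the n - pos - 1 later positions.
            index += comb(n - pos - 1, ones_left)
            ones_left -= 1
    return index


def unrank(index, n, w):
    """Inverse of rank: the length-n, weight-w word with the given index."""
    bits = []
    for pos in range(n):
        zeros_first = comb(n - pos - 1, w)  # words with a 0 at this position
        if index < zeros_first:
            bits.append(0)
        else:
            bits.append(1)
            index -= zeros_first
            w -= 1
    return bits


# Round trip over all C(6, 3) = 20 words of length 6 and weight 3.
for i in range(comb(6, 3)):
    assert rank(unrank(i, 6, 3)) == i
```

Every step evaluates a binomial coefficient whose size grows with $n$, which is the cost the text attributes to this method.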

An alternative is to use arithmetic coding (Rissanen and
Langdon [18], Rissanen [17]; see also Cover and Thomas [4,
§13.3]). Arithmetic coding is an efficient variable length source
coding technique for finite alphabet sources. Given a source
alphabet and a simple probability model for sequences $x$,
let $p(x)$ and $F(x)$ denote the probability distribution and
cumulative distribution function, respectively. An arithmetic
encoder represents $x$ by a number in the interval
$(F(x) - p(x), F(x)]$. The implementation of such a coder can also
run into problems with very long registers, but elegant
finite-length implementations are known and are widely used
(Witten, Neal and Cleary [28]). For constant weight codes, the
idea is to reverse the roles of encoder and decoder, i.e., to
use an arithmetic decoder as an encoder and an arithmetic
encoder as a constant weight decoder (Ramabadran [16]).
Ramabadran gives an efficient algorithm based on an adaptive
probability model, in the sense that the probability that the
incoming bit is a 1 depends on the number of 1's that have
already occurred. This approach successfully overcomes the
finite-register-length constraints associated with computing the
binomial coefficients and the resulting efficiency is often very
high, in many cases the loss of information being at most one
bit. The encoding complexity of the method is $O(n)$.

Knuth's complementation method [12] relies on the key 
observation that if the bits of a length-n binary sequence are 
complemented sequentially, starting from the beginning, there 
must be a point at which the weight is equal to $\lfloor n/2 \rfloor$. Given
the transformed sequence, it is possible to recover the original 
sequence by specifying how many bits were complemented 
(or the weight of the original sequence). This information is 
provided by a (relatively short) constant weight check string, 
and the resulting code consists of the transformed sequence 
followed by the constant weight check bits. In a series of 
papers, Bose and colleagues extended Knuth's method in 
various ways, and determined the limits of this approach 
(see [29] and references therein). The method is simple and 
efficient, and even though the overall complexity is $O(n)$, for
n = 100 we found it to be eight times as fast as the method 
based on arithmetic codes. However, the method only works 
for balanced codes, which restricts its applicability. 
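The key observation behind the complementation method can be sketched in a few lines of Python. This is only an illustration of the balancing step: the constant weight check string is omitted and the prefix length `k` is simply returned alongside the transformed sequence. The function names are hypothetical.

```python
def balancing_prefix(bits):
    """Smallest k such that complementing the first k bits gives weight len(bits) // 2."""
    n = len(bits)
    weight = sum(bits)
    target = n // 2
    for k in range(n + 1):
        if weight == target:
            return k
        if k < n:
            # Complementing bit k changes the weight by exactly +1 or -1,
            # so the weight must pass through the target on its way from
            # the original weight to n minus the original weight.
            weight += 1 - 2 * bits[k]
    raise AssertionError("unreachable for 0/1 input")


def balance(bits):
    """Return the balanced sequence and the prefix length used."""
    k = balancing_prefix(bits)
    return [1 - b for b in bits[:k]] + list(bits[k:]), k


def restore(balanced, k):
    """Undo balance() given the prefix length k."""
    return [1 - b for b in balanced[:k]] + list(balanced[k:])


word = [1, 1, 1, 0, 1]
balanced, k = balance(word)
assert sum(balanced) == len(word) // 2 and restore(balanced, k) == word
```

A single pass over the sequence suffices, consistent with the linear complexity quoted above.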

The two techniques that we have described above both have 
complexity that depends on the length n of the codewords. In 
contrast, the complexity of our algorithm depends only on 
the weight w, which makes it more suitable for codes with 
relatively low weight. 

As a final piece of background information, we define what 
we mean by a dissection. We assume the reader is familiar with 
the terminology of polytopes (see for example Coxeter [5], 
Grünbaum [9], Ziegler [30]). Two polytopes $P$ and $Q$ in
$\mathbb{R}^w$ are said to be congruent if $Q$ can be obtained from $P$
by a translation, a rotation and possibly a reflection in a
hyperplane. Two polytopes $P$ and $Q$ in $\mathbb{R}^w$ are said to be
equidecomposable if they can be decomposed into finite sets of
polytopes $P_1, \ldots, P_t$ and $Q_1, \ldots, Q_t$, respectively, for some
positive integer $t$, such that $P_i$ and $Q_i$ are congruent for all
$i = 1, \ldots, t$ (see Frederickson [8]). That is, $P$ is the disjoint
union of the polytopes $P_i$, and similarly for $Q$. If this is the
case then we say that P can be dissected to give Q (and that 
Q can be dissected to give P). 

Note that we allow reflections in the dissection: there are 
at least four reasons for doing so. (i) It makes no difference 
to the existence of the dissection, since if two polytopes are 
equidecomposable using reflections they are also equidecom- 
posable without using reflections. This is a classical theorem 
in two and three dimensions [8, Chap. 20] and the proof is 
easily generalized to higher dimensions. (ii) When studying
congruences, it is simpler not to have to worry about whether
the orthogonal matrix has determinant $+1$
or $-1$. (iii) Allowing reflections often reduces the number of
pieces. (iv) Since our dissections are mostly in dimensions
greater than three, the question of "physical realizability" is
usually irrelevant. Note also that we do not require that the $P_i$
can be obtained from $P$ by a succession of cuts along infinite
hyperplanes. All we require is that $P$ be a disjoint union of
the $P_i$.

One final technical point: when defining dissections using
coordinates, as in Eqns. (3), (4) below, we use a mixture of
$<$ and $\le$ signs in order to have unambiguously defined maps.
This is essential for our application. On the other hand, it 
means that the "pieces" in the dissection may be missing 
certain boundaries. It should therefore be understood that if 
we were focusing on the dissections themselves, we would 
replace each piece by its topological closure. 

For further information about dissections see the books 
mentioned in Section I.

III. The Geometric Interpretation 

In this section, we first consider the problem of encoding 
and decoding a binary constant weight code of weight w = 2 
and arbitrary length n, i.e., where there are only two bits 
set to 1 in any codeword. Our approach is based on the 
fact that vectors of weight two can be represented as points 
in two-dimensional Euclidean space, and can be scaled, or 
normalized, to lie in a right triangle. This approach is then 
extended, first to weight w = 3, and then to arbitrary weights 
w. 

For any weight $w$ and block length $n$, let $C_w$ denote the
set of all weight-$w$ vectors, with $|C_w| = \binom{n}{w}$. Our codebook
$C$ will be a subset of $C_w$, and will be equal to $C_w$ for a
fully efficient code, i.e., when $\eta_1 = 1$. We will represent a
codeword by the $w$-tuple $y' := (y'_1, y'_2, \ldots, y'_w)$,
$1 \le y'_1 < y'_2 < \cdots < y'_w \le n$, where $y'_i$ is the position of the $i$th
1 in the codeword, counting from the left. If we normalize
these indices $y'_i$ by dividing them by $n$, the codebook $C$
becomes a discrete subset of the polytope $T_w$, the convex hull
of the points $0^w, 0^{w-1}1, 0^{w-2}1^2, \ldots, 01^{w-1}, 1^w$. $T_2$ is a right
triangle, $T_3$ is a right tetrahedron and in general we will call
$T_w$ a unit orthoscheme.¹

The set of inputs to the encoder will be denoted by $\mathcal{R}_w$:
we assume that this consists of $w$-tuples $y := (y_1, y_2, \ldots, y_w)$
which range over a $w$-dimensional hyper-rectangle or "brick".
After normalization by dividing the $y_i$ by $n$, we may assume
that the input vector is a point in the hyper-rectangle or "brick"

$$B_w := [0, 1) \times [1 - 1/2, 1) \times \cdots \times [1 - 1/w, 1).$$

We will use $x := (x_1, x_2, \ldots, x_w) = y/n \in B_w$ and
$x' := (x'_1, x'_2, \ldots, x'_w) = y'/n \in T_w$ to denote the
normalized versions of the input vector and codeword, respectively,
defined by $x_i := y_i/n$ and $x'_i := y'_i/n$ for $i = 1, \ldots, w$.

The basic idea underlying our approach is to find a
dissection of $B_w$ that gives $T_w$. The encoding and decoding

¹An orthoscheme is a $w$-dimensional simplex having an edge path
consisting of $w$ totally orthogonal vectors (Coxeter [5]). In a unit
orthoscheme these edges all have length 1.





Fig. 2. Two ways to dissect rectangle B2 to give triangle T2. Piece 1 may 
be rotated about center into its new position, or reflected in main diagonal 
and translated downwards. 



algorithms are obtained by tracking how the points y and y' 
move during the dissection. 

The volume of $B_w$ is $1 \times \frac12 \times \frac13 \times \cdots \times \frac1w = \frac{1}{w!}$. This is also
the volume of $T_w$, as the following argument shows. Classify
the points $x = (x_1, \ldots, x_w)$ in the unit cube $[0, 1]^w$ into $w!$
regions according to their order when sorted; the regions are
congruent, so all have volume $1/w!$, and the region where the
$x_i$ are in nondecreasing order is $T_w$.

We now return to the case w = 2. There are many ways 
to dissect the rectangle B2 into the right triangle T2. We 
will consider two such dissections, both two-piece dissections 
based on Fig. 2.

In the first dissection, the triangular piece marked 1 in Fig. 2
is rotated clockwise about the center of the square until it
reaches the position shown on the right in Fig. 2. In the second
dissection, the piece marked 1 is first reflected in the main
diagonal of the square and then translated downwards until
it reaches the position shown on the right in Fig. 2. In both
dissections the piece marked 2 is fixed.

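The two dissections can be specified in terms of
coordinates² as follows. For the first dissection, we set

$$(x'_1, x'_2) := \begin{cases} (x_1, x_2) & \text{if } x_1 < x_2, \\ (1 - x_1, 1 - x_2) & \text{if } x_1 \ge x_2, \end{cases} \tag{3}$$

and for the second, we set

$$(x'_1, x'_2) := \begin{cases} (x_1, x_2) & \text{if } x_1 < x_2, \\ (x_2 - \tfrac12, x_1 - \tfrac12) & \text{if } x_1 \ge x_2. \end{cases} \tag{4}$$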



The first dissection involves only a rotation, but seems 
harder to generalize to higher dimensions. The second one 
is the one we will generalize; it uses a reflection, but as 
mentioned at the end of Section II, this is permitted by the
definition of a dissection. 

We next illustrate how these dissections can be converted 
into encoding algorithms for constant weight (weight 2) binary 
codes. Again there may be several solutions, and the best 
algorithm may depend on arithmetic properties of n (such as 
its parity). We work now with the unnormalized sets $\mathcal{R}_2$ and
$C_2$. In each case the output is a weight-2 binary vector with
1's in positions $y'_1$ and $y'_2$.

²For our use of a mixture of $<$ and $\le$ signs, see the remark at the end of
Section II.



A. First Dissection, Algorithm 1 

1) The input is an information vector $(y_1, y_2) \in \mathcal{R}_2$ with
$1 \le y_1 \le n - 1$ and $\lceil n/2 \rceil + 1 \le y_2 \le n$.

2) If $y_1 < y_2$, we set $y'_1 = y_1$, $y'_2 = y_2$, otherwise we set
$y'_1 = n - y_1$ and $y'_2 = n - y_2 + 1$.

For $n$ even, this algorithm generates all possible $n(n-1)/2$
codewords. For $n$ odd it generates only $(n-1)^2/2$ codewords,
leading to a slight inefficiency, and the following algorithm is
to be preferred.

B. First Dissection, Algorithm 2 

1) The input is an information vector $(y_1, y_2) \in \mathcal{R}_2$ with
$1 \le y_1 \le n$, $\lceil (n+1)/2 \rceil + 1 \le y_2 \le n$.

2) If $y_1 < y_2$, we set $y'_1 = y_1$, $y'_2 = y_2$, otherwise we set
$y'_1 = n - y_1 + 1$, $y'_2 = n - y_2 + 2$.

For $n$ odd, this algorithm generates all $n(n-1)/2$ codewords,
but for $n$ even it generates only $n(n-2)/2$ codewords, again
leading to a slight inefficiency.

C. Second Dissection 

1) The input is an information vector $(y_1, y_2) \in \mathcal{R}_2$ with
$1 \le y_1 \le n - 1$ and $\lceil n/2 \rceil + 1 \le y_2 \le n$.

2) If $y_1 < y_2$, we set $y'_1 = y_1$, $y'_2 = y_2$, otherwise we set
$y'_1 = y_2 - \lceil n/2 \rceil$, $y'_2 = y_1 - \lceil n/2 \rceil + 1$.

For $n$ even, this algorithm generates all $n(n-1)/2$ codewords,
but for $n$ odd it generates only $(n-1)^2/2$ codewords, leading
to a slight inefficiency. There is a similar algorithm, not given
here, which is better when $n$ is odd.

Note that only one test is required in any of the encoding
algorithms. The mappings are invertible, with obvious
decoding algorithms corresponding to the inverse mappings from $C_2$
to $\mathcal{R}_2$.
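The three weight-2 encoders can be sketched in Python as follows. This is a minimal sketch under one stated assumption: the lower limits on $y_2$ are read with ceilings, i.e. $\lceil n/2 \rceil + 1$ and $\lceil (n+1)/2 \rceil + 1$, which is the reading under which each map is one-to-one. The function names and the helper `codewords` are ours.

```python
def ceil_half(m):
    """ceil(m / 2) for a nonnegative integer m."""
    return (m + 1) // 2


def first_dissection_alg1(y1, y2, n):
    """Input range: 1 <= y1 <= n-1, ceil(n/2)+1 <= y2 <= n."""
    if y1 < y2:
        return (y1, y2)
    return (n - y1, n - y2 + 1)


def first_dissection_alg2(y1, y2, n):
    """Input range: 1 <= y1 <= n, ceil((n+1)/2)+1 <= y2 <= n."""
    if y1 < y2:
        return (y1, y2)
    return (n - y1 + 1, n - y2 + 2)


def second_dissection(y1, y2, n):
    """Input range: 1 <= y1 <= n-1, ceil(n/2)+1 <= y2 <= n."""
    if y1 < y2:
        return (y1, y2)
    h = ceil_half(n)
    return (y2 - h, y1 - h + 1)


def codewords(encoder, n, y1_max, y2_min):
    """Set of position pairs (y1', y2') produced over the whole input range."""
    return {encoder(y1, y2, n)
            for y1 in range(1, y1_max + 1)
            for y2 in range(y2_min, n + 1)}


# Count the distinct codewords each encoder produces for one even and one odd n.
for n in (10, 9):
    a1 = codewords(first_dissection_alg1, n, n - 1, ceil_half(n) + 1)
    a2 = codewords(first_dissection_alg2, n, n, ceil_half(n + 1) + 1)
    sd = codewords(second_dissection, n, n - 1, ceil_half(n) + 1)
    print(n, len(a1), len(a2), len(sd))
```

Each encoder needs a single comparison, and the printed counts can be checked against the codeword counts stated for each algorithm.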

We now extend this method to weight $w = 3$.
Fortunately, the Dehn invariants for both the brick $B_3$ and our
unit orthoscheme $T_3$, which is the tetrahedron³ with vertices
$(0,0,0)$, $(0,0,1)$, $(0,1,1)$ and $(1,1,1)$, are zero (since in both
cases all dihedral angles are rational multiples of $\pi$), and so
by the Dehn-Sydler theorem the polyhedra $B_3$ and $T_3$ are
equidecomposable. As already mentioned in Section I, the
Dehn-Sydler theorem applies only in three dimensions. But
it will follow from the algorithm given in the next section that
$B_w$ and $T_w$ are equidecomposable in all dimensions.

We will continue to describe the encoding step (the map
from $B_w$ to $T_w$) first. We will give an inductive dissection
(see Fig. 3), transforming $B_3$ to $T_3$ in two steps, effectively
reducing the dimension by one at each step. In the first step,
the brick $B_3$ is dissected into a triangular prism (the product
of a right triangle, $T_2$, and an interval), and in the second step
this triangular prism is dissected into the tetrahedron $T_3$. Note
that the first step has essentially been solved by the dissection
given in Eqn. (4).

For the second step we use a four-piece dissection of the 
triangular prism to the tetrahedron T3. This dissection, shown 

³To solve Hilbert's third problem, Dehn showed that this tetrahedron is not
equidecomposable with a regular tetrahedron of the same volume. 
Fig. 3. Transformation from tetrahedron to rectangular prism.




Fig. 4. Four-piece dissection of tetrahedron to triangular prism. Pieces 2 and 
3 are reflected. 

with the tetrahedron and prism superimposed in Fig. 4, appears
to be new. 

There is a well-known dissection of the same pair of 
polyhedra that was first published by Hill in 1896 [10]. This 
also uses four pieces, and is discussed in several references: 
see Boltianskii [1, p. 99], Cromwell [6, p. 47], Frederickson [8, 
Fig. 20.4], Sydler [22], Wells [27, p. 251]. However, Hill's 
dissection seems harder to generalize to higher dimensions. 
Hill's dissection does have the advantage over ours that it can 
be accomplished purely by translations and rotations, whereas 
in our dissection two of the pieces (pieces labeled 2 and 3 in
Fig. 4) are also reflected. However, as mentioned at the end of
Section II, this is permitted by the definition of a dissection,
and is not a drawback for our application.⁴ Apart from this,
our dissection is simpler than Hill's, in the sense that his
dissection requires a cut along a skew plane ($x_1 - x_3 = 1/3$),
whereas all our cuts are parallel to coordinate axes.

To obtain the four pieces shown in Fig. 4, we first make two
horizontal cuts along the planes $x_3 = \frac13$ and $x_3 = \frac23$, dividing
the tetrahedron into three slices. We then cut the middle slice
into two by a vertical cut along the plane $x_2 = \frac13$.

There appears to be a tradition in geometry books that 
discuss dissections of not giving coordinates for the pieces. 
To an engineer this seems unsatisfactory, and so in Table I
we list the vertices of the four pieces in our dissection. Piece 
1 has four vertices, while the other three pieces each have 
six vertices. (In the Hill dissection the numbers of vertices of 
the four pieces are 4, 5, 6 and 6 respectively.) Given these 
coordinates, it is not difficult to verify that the four pieces can 

⁴This dissection would also work if piece 2 was merely translated and
rotated, not reflected, but the reflection is required by our general algorithm. 
be reassembled to form the triangular prism, as indicated in
Fig. 4. As already remarked, pieces 2 and 3 are also reflected
(or "turned over" in a fourth dimension). The correctness of 
the dissection also follows from the alternative description of 
this dissection given below. 



TABLE I
Coordinates of vertices of pieces in dissection of tetrahedron
shown in Fig. 4

Piece   Coordinates
1       [0, 0, 0], [0, 0, 1/3], [0, 1/3, 1/3], [1/3, 1/3, 1/3]
2       [0, 0, 1/3], [0, 1/3, 1/3], [1/3, 1/3, 1/3],
        [0, 0, 2/3], [0, 1/3, 2/3], [1/3, 1/3, 2/3]
3       [0, 1/3, 1/3], [1/3, 1/3, 1/3], [0, 1/3, 2/3],
        [0, 2/3, 2/3], [2/3, 2/3, 2/3], [1/3, 1/3, 2/3]
4       [0, 0, 2/3], [0, 2/3, 2/3], [2/3, 2/3, 2/3],
        [0, 0, 1], [0, 1, 1], [1, 1, 1]

The dissection shown in Fig. 4 can be described
algebraically as follows. We describe it in the more logical
direction, going from the triangular prism to the tetrahedron,
since this is what we will generalize to higher dimensions
in the next section. The input is a vector $(x_1, x_2, x_3)$ with
$0 \le x_1 \le x_2 < 1$, $\frac23 \le x_3 < 1$; the output is a vector
$(x'_1, x'_2, x'_3)$ with $0 \le x'_1 \le x'_2 \le x'_3 < 1$, given by

$$(x'_1, x'_2, x'_3) := \begin{cases}
(x_1, x_2, x_3) & \text{if } x_1 \le x_2 < x_3, \\
(x_1 - \tfrac13, x_3 - \tfrac13, x_2 - \tfrac13) & \text{if } \tfrac13 \le x_1 < x_3 \le x_2, \\
(x_3 - \tfrac23, x_2 - \tfrac23, x_1 + \tfrac13) & \text{if } x_1 < \tfrac13 < x_3 \le x_2, \\
(x_3 - \tfrac23, x_1 - \tfrac23, x_2 - \tfrac23) & \text{if } x_3 \le x_1 \le x_2.
\end{cases} \tag{5}$$



The four cases in Eqn. (5), after being transformed,
correspond to the pieces labeled 4, 3, 2, 1 respectively in Fig. 4. We
see from Eqn. (5) that in the second and third cases the linear
transformation has determinant $-1$, indicating that these two
pieces must be reflected.
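The four-case map from the triangular prism to the tetrahedron can be written out directly. The sketch below is one reading of Eqn. (5), with the boundary cases assigned as $x_1 \le x_2 < x_3$, $\frac13 \le x_1 < x_3 \le x_2$, $x_1 < \frac13 < x_3 \le x_2$ and $x_3 \le x_1 \le x_2$; that assignment of $<$ versus $\le$ is our assumption, and the function name is hypothetical. Exact rational arithmetic is used so that the boundary cases are unambiguous.

```python
from fractions import Fraction as F


def prism_to_tetrahedron(x1, x2, x3):
    """Map a prism point (0 <= x1 <= x2 < 1, 2/3 <= x3 < 1) into the tetrahedron."""
    third = F(1, 3)
    if x2 < x3:
        # x3 is the largest coordinate: identity (piece 4).
        return (x1, x2, x3)
    if x1 < x3:
        # Here x1 < x3 <= x2; split on whether x1 reaches 1/3.
        if x1 >= third:
            return (x1 - third, x3 - third, x2 - third)       # piece 3
        return (x3 - 2 * third, x2 - 2 * third, x1 + third)   # piece 2
    # Here x3 <= x1 <= x2.
    return (x3 - 2 * third, x1 - 2 * third, x2 - 2 * third)   # piece 1


print(prism_to_tetrahedron(F(1, 6), F(5, 6), F(3, 4)))
```

In the two middle branches the coordinates are permuted by a transposition, which is where the determinant $-1$ mentioned above comes from.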

Since it is hard to visualize dissections in dimensions greater 
than three, we give a schematic representation of the above 
dissection that avoids drawing polyhedra. Fig. 5 shows a
representation of the transformation from the triangular prism
to the tetrahedron $T_3$, equivalent to that given in Eqn. (5). The
steps shown in Fig. 5 may be referred to as "cut and paste"
operations, because, as Fig. 5 shows, the vector in the
triangular prism is literally cut up into pieces which are rearranged
and relabeled. Note that, to complete the transformation, we
precede this operation by the dissection given in Eqn. (4),
finally establishing the bijection between $B_3$ and $T_3$.

We now describe the mapping shown in Fig. 5 in more
detail. The triangular prism is represented by the set of
partially ordered triples $(x_1, x_2, x_3)$ with $0 \le x_1 \le x_2 < 1$
and $\frac23 \le x_3 < 1$, and we wish to transform this into the
tetrahedron consisting of the points $(x'_1, x'_2, x'_3)$ with
$0 \le x'_1 \le x'_2 \le x'_3 < 1$.

We divide the interval $[0, 1)$ into $w = 3$ equal segments
of length $1/w = 1/3$, and consider where the points $x_1, x_2$
and $x_3$ fall in this interval, given that $(x_1, x_2, x_3)$ is in the
triangular prism. There are three possibilities for where $x_3$ lies
in relation to $0 \le x_1 \le x_2 < 1$, and we further divide the case
$x_1 < x_3 \le x_2$ into two subcases depending on whether $x_1 \ge \frac13$
or $x_1 < \frac13$. These are the four cases shown in Fig. 5, and
correspond one-to-one with the four dissection pieces in Fig. 4.
Fig. 5 shows how the triples $x_1, x_2, x_3$ (reindexed according
to their relative positions) are mapped to the triples $x'_1, x'_2, x'_3$.

The last column of Fig. 5 shows the ranges of the $x'_i$ in the
four cases; the fact that these ranges are disjoint guarantees
that the mapping from $x_1, x_2, x_3$ to $x'_1, x'_2, x'_3$ is invertible.
The ranges of the $x'_i$ will be discussed in more detail in the
following section after the general algorithms are presented.

This operation can now be described without explicitly
mentioning the underlying dissection. Each interval of length
$1/w$, together with the given $x_i$ values within it, is treated
as a single complete unit. In the "cut and paste" operations,
these units are rearranged and relabeled in such a way that the
operation is invertible.

IV. Algorithms and Proof of Correctness 

In the previous section we provided an encoding and
decoding algorithm for weights $w = 2$ and $w = 3$, based on our
geometric interpretation of $C_2$ and $C_3$ as points in $\mathbb{R}^w$. In this
section, the algorithm is generalized to larger values of the
weight $w$. We start with the geometry, and give a dissection
of the "brick" $B_w$ into the orthoscheme $T_w$. We work with
the normalized coordinates $x_i = y_i/n$ (for a point in $B_w$) and
$x'_i = y'_i/n$ (for a point in $T_w$), where $1 \le i \le w$. Later in
this section, we discuss the modifications needed to take into
account the fact that the $y'_i$ must be integers.

A. An Inductive Decomposition of the Orthoscheme 

Restating the problem, we wish to find a bijection
between the sets $B_w$ and $T_w$. The inductive approach developed
for $w = 3$ (where the $w = 2$ case was a subproblem) will be
generalized. Of course the bijection $F_1$ between $B_1$ and $T_1$
is trivial. We assume that a bijection $F_{w-1}$ is known between
$B_{w-1}$ and $T_{w-1}$, and show how to construct a bijection $F_w$
between $B_w$ and $T_w$.

The last step in the induction uses a map $f_w$ from the prism
$T_{w-1} \times [1 - \frac1w, 1)$ to $T_w$ ($f_2$ is the map described in Eqn. (4)
and $f_3$ is described in Eqn. (5)). The mapping $F_w$ from $B_w$
to $T_w$ is then given recursively by $F_w : (x_1, x_2, \ldots, x_w) \mapsto
(x'_1, x'_2, \ldots, x'_w)$, where

$$(x'_1, x'_2, \ldots, x'_w) := f_w\bigl(F_{w-1}(x_1, x_2, \ldots, x_{w-1}), x_w\bigr). \tag{6}$$

For $w = 1$ we set

$$F_1 := f_1 : B_1 \to T_1, \quad (x_1) \mapsto (x'_1) = (x_1).$$

By iterating Eqn. (6), we see that $F_w$ is obtained by
successively applying the maps $f_1, f_2, \ldots, f_w$.

The following algorithm defines $f_w$ for $w \ge 2$. We begin
with an algebraic definition of the mapping and its inverse,
and then discuss it further in the following section. The input
to the mapping $f_w$ is a vector $x := (x_1, x_2, \ldots, x_w)$, with
$(x_1, x_2, \ldots, x_{w-1}) \in T_{w-1}$ and $x_w \in [1 - 1/w, 1)$; the output
is a vector $x' := (x'_1, x'_2, \ldots, x'_w) \in T_w$.

Fig. 5. Cut-and-paste description of the inverse transformation from triangular prism to tetrahedron.

Forward mapping $f_w$ ($w \ge 2$):

1) Let

$$i_0 := \min\{ i \in \{1, \ldots, w\} \mid x_w \le x_i \}.$$

2) Let

$$j_0 := \min\{ i \in \{1, \ldots, i_0\} \mid w - i_0 + i - 1 \le w x_i \} - 1.$$

3) Set $x'_k$ equal to:

$$x'_k := \begin{cases}
x_{k+j_0} - \frac{w + j_0 - i_0}{w} & \text{for } k = 1, \ldots, i_0 - j_0 - 1, \\
x_w - \frac{w + j_0 - i_0}{w} & \text{for } k = i_0 - j_0, \\
x_{k+j_0-1} - \frac{w + j_0 - i_0}{w} & \text{for } k = i_0 - j_0 + 1, \ldots, w - j_0, \\
x_{k-w+j_0} + \frac{i_0 - j_0}{w} & \text{for } k = w - j_0 + 1, \ldots, w.
\end{cases} \tag{7}$$

Eqn. (7) identifies the "cut and paste" operations required to
obtain $x'_k$ for different ranges of the variable $k$. If the final
index in one of the four cases in Eqn. (7) is smaller than the
initial index, that case is to be skipped. A case is also skipped
if the subscript for an $x_i$ is not in the range $1, \ldots, w$. Note
in Step 1 that $i_0 = w$ if $x_w$ is the largest of the $x_i$'s. This
implies that $j_0 = 0$, and then Step 3 is the identity map.

The inverse mapping $G_w$ from $T_w$ to $B_w$ has a similar
recursive definition. The $w$th step in the induction is the map
$g_w : T_w \to T_{w-1} \times [1 - \frac1w, 1)$ defined below. For $w = 1$ we
set

$$G_1 := g_1 : T_1 \to B_1, \quad (x'_1) \mapsto (x_1) = (x'_1).$$

The map $G_w$ is obtained by successively applying the maps
$g_w, g_{w-1}, \ldots, g_1$.

Inverse mapping $g_w$ ($w \ge 2$):

1) Let

$$m_0 := \max\{ i \in \{1, \ldots, w\} \mid i - 1 \le w x'_i \}.$$

2) If $m_0 = w$, let $j_0 := 0$, otherwise let

$$j_0 := w - \max\{ i \in \{m_0 + 1, \ldots, w\} \mid w x'_i < m_0 \};$$

in either case, let $i_0 := m_0 + j_0$.

3) Set $x_k$ equal to:

$$x_k := \begin{cases}
x'_{k+w-j_0} - \frac{m_0}{w} & \text{for } k = 1, \ldots, j_0, \\
x'_{k-j_0} + \frac{w + j_0 - i_0}{w} & \text{for } k = j_0 + 1, \ldots, i_0 - 1, \\
x'_{k-j_0+1} + \frac{w + j_0 - i_0}{w} & \text{for } k = i_0, \ldots, w - 1, \\
x'_{m_0} + \frac{w + j_0 - i_0}{w} & \text{for } k = w.
\end{cases} \tag{8}$$

Note that the transformations in Eqn. (7) and Eqn. (8) are
formal inverses of each other, and that these transformations
are volume-preserving. The underlying linear transformations
are orthogonal transformations with determinant $+1$ or $-1$.

Before proceeding further, let us verify that in the case $w = 3$,
the mapping $f_w = f_3$ agrees with that given in Eqn. (5).

• If $x_1 \le x_2 < x_3$, then $i_0 = 3$, $j_0 = 0$ and the map is the
identity, as mentioned above.
• If $x_1 < x_3 \le x_2$ there are two subcases:
∘ If $\frac13 \le x_1$ then $i_0 = 2$, $j_0 = 0$.
∘ If $x_1 < \frac13$ then $i_0 = 2$, $j_0 = 1$.
• If $x_3 \le x_1 \le x_2$, then $i_0 = 1$, $j_0 = 0$.

The transformations in Eqn. (7) now exactly match those in
Eqn. (5).
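The forward and inverse steps, and their composition into $F_w$ and $G_w$, can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than a definitive implementation: the test in Step 2 of the forward mapping is read as "on or above the line" (that is, $w - i_0 + i - 1 \le w x_i$), Step 1 of the inverse as $i - 1 \le w x'_i$, and the four index ranges of Eqns. (7) and (8) are read as in the code comments. Function names are ours, and inputs are assumed to be `Fraction` values so that boundary cases are exact.

```python
from fractions import Fraction


def forward_step(x):
    """f_w: x[:-1] nondecreasing in [0, 1), x[-1] in [1 - 1/w, 1); returns x'."""
    w, xw = len(x), x[-1]
    # Step 1: position i0 (1-based) that x_w takes in the sorted list.
    i0 = min(i for i in range(1, w + 1) if xw <= x[i - 1])
    # Step 2: j0 + 1 is the first index on or above the line y = x + w - i0 - 1.
    j0 = min(i for i in range(1, i0 + 1) if w - i0 + i - 1 <= w * x[i - 1]) - 1
    down, up = Fraction(w - i0 + j0, w), Fraction(i0 - j0, w)
    out = []
    for k in range(1, w + 1):              # Step 3, the four ranges of k
        if k <= i0 - j0 - 1:
            out.append(x[k + j0 - 1] - down)        # x_{k+j0}
        elif k == i0 - j0:
            out.append(xw - down)                   # x_w
        elif k <= w - j0:
            out.append(x[k + j0 - 2] - down)        # x_{k+j0-1}
        else:
            out.append(x[k - w + j0 - 1] + up)      # x_{k-w+j0}
    return out


def inverse_step(xp):
    """g_w: undo forward_step on a nondecreasing vector x' in [0, 1)^w."""
    w = len(xp)
    m0 = max(i for i in range(1, w + 1) if i - 1 <= w * xp[i - 1])
    if m0 == w:
        j0 = 0
    else:
        j0 = w - max(i for i in range(m0 + 1, w + 1) if w * xp[i - 1] < m0)
    i0 = m0 + j0
    down, up = Fraction(w - i0 + j0, w), Fraction(m0, w)
    x = []
    for k in range(1, w + 1):
        if k <= j0:
            x.append(xp[k + w - j0 - 1] - up)       # x'_{k+w-j0}
        elif k <= i0 - 1:
            x.append(xp[k - j0 - 1] + down)         # x'_{k-j0}
        elif k <= w - 1:
            x.append(xp[k - j0] + down)             # x'_{k-j0+1}
        else:
            x.append(xp[m0 - 1] + down)             # x'_{m0}
    return x


def encode(x):
    """F_w: apply f_1, f_2, ..., f_w in turn to a point of the brick."""
    out = []
    for xi in x:
        out = forward_step(out + [xi])
    return out


def decode(xp):
    """G_w: apply g_w, g_{w-1}, ..., g_1, peeling off one coordinate each time."""
    cur, tail = list(xp), []
    while cur:
        cur = inverse_step(cur)
        tail.append(cur.pop())
    return tail[::-1]


point = [Fraction(1, 6), Fraction(5, 6), Fraction(3, 4)]
print(encode(point), decode(encode(point)) == point)
```

Each step scans the current vector a constant number of times, so a single step costs time linear in $w$ and the full map applies $w$ such steps.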



B. Interpretations and Explanations 

In Fig. 6 we give a graphical interpretation of the algorithm,
which can be regarded as a generalization of the "cut and
paste" description given above. This figure shows the
transformation defined by the $w$th step in the algorithm. At this
step, we begin with a list of $w - 1$ numbers $(x_1, x_2, \ldots, x_{w-1})$
in increasing order, and a further number $x_w$ which may be
anywhere in the interval $[1 - 1/w, 1)$. This list of $w$ numbers
is plotted in the plane as the set of $w$ points $(i, w x_i)$ for
$i = 1, 2, \ldots, w$ (indicated by the solid black circles in Fig. 6).
In the first step in the forward algorithm, the augmented
list $(x_1, x_2, \ldots, x_w)$ is sorted into increasing order. In the
sorted list, $x_w$ now occupies position $i_0$, so the point $(w, w x_w)$
moves to the left, to the new position $(i_0, w x_w)$, and the points
$(i, w x_i)$ for $i = i_0 + 1, \ldots, w - 1$ move to the right. This is
indicated by the arrows in the figure. The new positions of
these points are marked by hollow circles.

Fig. 6. A graphical illustration of the forward and inverse mapping.

The point $(i_0, w x_w)$ now lies between the grid points $(i_0, w)$
and $(i_0, w - 1)$ (it may coincide with the latter point), since
$x_w \ge 1 - \frac{1}{w}$. We draw the line $y = x + w - i_0 - 1$ (shown as
the dashed-and-dotted line in Fig. 6). This has unit slope and
passes through the points $(i_0, w - 1)$ and $(0, w - i_0 - 1)$. The
algorithm then computes $j_0 + 1$ to be the smallest index $i$ for
which the point $(i, w x_i)$ is on or above this line. Once $i_0$ and
$j_0$ have been determined, the forward mapping proceeds as follows. The
points $(i, w x_i)$ for $i = 1, \ldots, j_0$ are shifted to the right of the
figure and are moved upwards by the amount $(i_0 - j_0)/w$,
their new positions being indicated by crosses in the figure.
Finally, the origin is moved to the grid point $(j_0, w - i_0 + j_0)$
and the points are reindexed. The $m_0 := i_0 - j_0$ points which
originally had indices $j_0 + 1, \ldots, i_0$ become points $1, \ldots, m_0$
after reindexing. In the new coordinates, the final positions
of the points lie inside the square region $[0, w] \times [0, w]$. The
reader can check that this process is exactly equivalent to the
algebraic description of $f_w$ given above.

To recover $i_0$ and $j_0$, we first determine the value of $m_0 :=
i_0 - j_0$. This can indeed be done since $m_0$ is precisely the index
of the largest $w x'_i$ that lies on or above the line $y = x - 1$ in
the new coordinate system. Note that the position of this line
is independent of $i_0$ and $j_0$ and $(x'_1, x'_2, \ldots, x'_w)$. This works
because the points $w x_1, \ldots, w x_{j_0}$ in the original coordinate
system, before the origin is shifted, are moved right by $w$ units
and upwards by $w$ units, so points below the dashed-and-dotted
line remain below the line. Furthermore, observe that in the
new coordinate system the number of points $(i, w x'_i)$ below
the line $y = m_0$ is equal to $w - j_0$. Thus the correct $i_0$ and
$j_0$ values may be recovered, and the inverse mapping can be
successfully performed.
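
The geometric description above can be turned into a few lines of code. The following is a minimal sketch, not the authors' implementation: the function name `forward_real` is hypothetical, exact rationals (`fractions.Fraction`) stand in for real numbers, and the input is assumed to satisfy $x_1 \le \cdots \le x_{w-1}$ in $[0, 1)$ with $x_w \in [1 - 1/w, 1)$.

```python
from fractions import Fraction


def forward_real(x):
    """One geometric step (hypothetical helper, sketch only).

    x[:-1] is assumed sorted in [0, 1); x[-1] is assumed to lie in [1 - 1/w, 1).
    """
    w = len(x)
    xw = x[-1]
    # Sort the augmented list: x_w lands at the 1-based position i0.
    i0 = next((i + 1 for i in range(w - 1) if xw <= x[i]), w)
    z = list(x[:i0 - 1]) + [xw] + list(x[i0 - 1:w - 1])
    # j0 + 1 is the smallest index whose point (i, w * z_i) lies on or above
    # the line y = x + w - i0 - 1.
    j0 = next(i for i in range(1, i0 + 1) if w * z[i - 1] >= i + w - i0 - 1) - 1
    m0 = i0 - j0
    # Moving the origin to (j0, w - m0): the points after the first j0 lose
    # (w - m0)/w, while the first j0 points wrap to the right and gain m0/w.
    kept = [v - Fraction(w - m0, w) for v in z[j0:]]
    wrapped = [v + Fraction(m0, w) for v in z[:j0]]
    return kept + wrapped
```

For example, with $w = 3$ the input $(1/12, 5/6, 3/4)$ gives $i_0 = 2$, $j_0 = 1$, and the output $(1/12, 1/6, 5/12)$, which is sorted and whose wrapped last point satisfies $x'_3 < 2/3$.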

The following remarks record two properties of the algorithm
that will be used later.

Remark 1: Step 2 of the forward algorithm implies that
$x_{j_0} < \frac{w - i_0 + j_0 - 1}{w}$ and $x_{j_0+1} \ge \frac{w - i_0 + j_0}{w}$. It follows that there
is no $i$ in the range $1 \le i \le w$ for which

$$\frac{w - i_0 + j_0 - 1}{w} \le x_i < \frac{w - i_0 + j_0}{w}.$$

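Remark 2: The forward algorithm produces a vector $x'$
whose components satisfy

$$0 \le x'_1 \le \cdots \le x'_{w-j_0} < \frac{i_0 - j_0}{w}, \tag{9}$$

$$\frac{i_0 - j_0}{w} \le x'_{w-j_0+1} \le \cdots \le x'_w, \tag{10}$$

and

$$x'_k < \frac{k-1}{w} \quad \text{for } w - j_0 + 1 \le k \le w. \tag{11}$$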



Eqns. (9) and (10) follow from the minimizations in Steps 1
and 2 of the forward algorithm, respectively. The right-hand
side of Eqn. (11) expresses the fact, already mentioned, that
the first $j_0$ points remain below the dashed-and-dotted line
after they are shifted.

C. Proof of Correctness 

We now give the formal proof that the algorithm is correct. 
This is simply a matter of collecting together facts that we 
have already observed. 

Theorem 1: For any $w \ge 1$, the forward mapping $f_w$ is
a one-to-one mapping from $T_{w-1} \times [1 - \frac{1}{w}, 1)$ to $T_w$ with
inverse $g_w$.

Proof: First, it follows from Remark 2 that, for $x \in
T_{w-1} \times [1 - \frac{1}{w}, 1)$, $x' = (x'_1, x'_2, \ldots, x'_w)$ satisfies
$0 \le x'_1 \le x'_2 \le \cdots \le x'_w < 1$, and so is an element of $T_w$.

Suppose there were two different choices for $x$, say $x^{(1)}$
and $x^{(2)}$, such that $f_w(x^{(1)}) = f_w(x^{(2)}) = x'$.
We know that $x'$ determines $m_0$, $j_0$ and $i_0$. So $x^{(1)}$ and $x^{(2)}$
have the same associated values of $i_0$ and $j_0$. But for a given
pair $(i_0, j_0)$, Eqn. (7) is invertible. Hence $x^{(1)} = x^{(2)}$, and
$f_w$ is one-to-one.

Note that the transformations in Eqn. (7) and Eqn. (8) are
inverses of each other. Hence $f_w$ is also an onto map, and
$g_w$ is its inverse. ■

D. Number of Pieces 

The map $f_w$, which dissects the prism $T_{w-1} \times [1 - \frac{1}{w}, 1)$ to
give the orthoscheme $T_w$, has one piece for each pair $(i_0, j_0)$.
If $i_0 = w$ then $j_0 = 0$, while if $1 \le i_0 \le w - 1$, $j_0$ takes all
values from $0$ to $i_0 - 1$. (It is easy to write down an explicit
point in the interior of the piece corresponding to a specified
pair of values of $i_0$ and $j_0$. Assume $i_0 < w$ and set $\delta = \ldots$
Take the point with coordinates $(x_1, \ldots, x_w)$ given by $x_w =
(w - 1)/w + \delta$; $x_i = x_w + \delta(i - i_0)$ for $i = i_0 + 1, \ldots, w - 1$;
$x_i = (i + w - i_0 - 1 - \delta)/w$ for $i = 1, \ldots, j_0$; $x_i = (i + w -
i_0 - 1 + \delta)/w$ for $i = j_0 + 1, \ldots, i_0 - 1$.) The total number
of pieces in the dissection is therefore

$$1 + 1 + 2 + 3 + \cdots + (w - 1) = \frac{w^2 - w + 2}{2},$$


which is $1, 2, 4, 7, 11, \ldots$ for $w = 1, 2, 3, 4, 5, \ldots$. This is a
well-known sequence, entry A124 in [21], which by coincidence
also arises in a different dissection problem; it is the
maximal number of pieces into which a circular disk can be
cut with $w - 1$ straight cuts. For example, with three cuts, a
pizza can be cut into a maximum of seven pieces, and this is
also the number of pieces in the dissection defined by $f_4$.
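
The count can be checked by enumerating the admissible pairs directly. This is a small illustrative sketch; the helper names `count_pieces` and `lazy_caterer` are hypothetical and not taken from the paper.

```python
def count_pieces(w):
    """Count the (i0, j0) pairs: j0 = 0 when i0 = w, else j0 ranges over 0..i0-1."""
    pairs = [(w, 0)] + [(i0, j0) for i0 in range(1, w) for j0 in range(i0)]
    return len(pairs)


def lazy_caterer(cuts):
    """Maximal number of pieces of a disk after `cuts` straight cuts."""
    return cuts * (cuts + 1) // 2 + 1


# Piece counts for w = 1..5 and the closed form (w^2 - w + 2) / 2.
counts = [count_pieces(w) for w in range(1, 6)]
closed_form = [(w * w - w + 2) // 2 for w in range(1, 6)]
```

Here `counts` and `closed_form` both evaluate to `[1, 2, 4, 7, 11]`, and `count_pieces(w)` agrees with `lazy_caterer(w - 1)`.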

E. The Algorithms for Positive Integers 

To apply the above algorithm to the problem of encoding
and decoding constant weight codes, we must work with
positive integers rather than real numbers, which entails a
certain loss in rate, although the algorithms remain largely
unchanged. Let $\mathbb{N} := \{1, 2, 3, \ldots\}$, and let $n$ and $w$ be given
with $2w < n$. In a manner analogous to the real-valued case,
we find a bijection between a finite hyper-rectangle or brick
$B_w^n \subset \mathbb{N}^w$ and a subset of the finite orthoscheme $T_w^n \subset
\mathbb{N}^w$, where $B_w^n$ is the set of vectors $(y_1, y_2, \ldots, y_w) \in \mathbb{N}^w$
satisfying

$$n - (w - i) - \left\lfloor \frac{n - (w - i)}{i} \right\rfloor + 1 \le y_i \le n - (w - i),$$

for $i = 1, 2, \ldots, w$, and $T_w^n$ is the set of vectors
$(y_1, y_2, \ldots, y_w) \in \mathbb{N}^w$ satisfying

$$1 \le y_1 < y_2 < \cdots < y_w \le n.$$

Note that usually $|B_w^n| < |T_w^n|$, which entails a loss in rate.

The forward mapping $f_w$ is now replaced by an integer-valued map
which sends $(y_1, y_2, \ldots, y_w)$ with $(y_1, y_2, \ldots, y_{w-1}) \in T_{w-1}^{n-1}$
and $n - \lfloor n/w \rfloor + 1 \le y_w \le n$ to an element of $T_w^n$. Let us write
$n = pw + q$, where $p \ge 0$ and $0 \le q \le w - 1$. We partition
the range $1, 2, \ldots, n$ into $w$ parts, where the first $w - q - 1$
parts each have $p$ elements, the next $q$ parts each have $p + 1$
elements, and the last part has $p$ elements (giving a total of $n$
elements). This is similar to the real-valued case, where each
interval had length $1/w$.

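1) Let

$$i_0 := \min\{i \in \{1, \ldots, w\} \mid y_w \le y_i\}.$$

2) Let

$$j_0 := \min\{i \in \{1, \ldots, i_0\} \mid V_i < y_i\} - 1,$$

where $V_i := (w - i_0 + i - 1)p + \max\{q - i_0 + i, 0\}$.
3) Set $y'_k$ equal to:

$$y'_k = \begin{cases}
y_{k+j_0} - V_{j_0+1} & \text{for } k = 1, \ldots, i_0 - j_0 - 1, \\
y_w - V_{j_0+1} & \text{for } k = i_0 - j_0, \\
y_{k+j_0-1} + 1 - V_{j_0+1} & \text{for } k = i_0 - j_0 + 1, \ldots, w - j_0, \\
y_{k-w+j_0} + n - V_{j_0+1} & \text{for } k = w - j_0 + 1, \ldots, w.
\end{cases}$$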
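
Steps 1 to 3 of the integer forward map can be sketched as follows. This is an illustrative reading of the steps, not the authors' code: the name `forward_step` is hypothetical, the input is assumed to have $y_1 < \cdots < y_{w-1}$ in $1, \ldots, n-1$ and $y_w$ in $n - \lfloor n/w \rfloor + 1, \ldots, n$, and lists are 0-based while the comments use the 1-based indices of the text.

```python
def forward_step(y, n):
    """Integer forward step for weight w = len(y) and length n (sketch only)."""
    w = len(y)
    p, q = divmod(n, w)  # n = p*w + q
    yw = y[-1]
    # Step 1: 1-based position i0 that y_w takes in the sorted list.
    i0 = next((i + 1 for i in range(w - 1) if yw <= y[i]), w)

    def V(i):
        # Threshold V_i: the end of the part that entry i is compared against.
        return (w - i0 + i - 1) * p + max(q - i0 + i, 0)

    # Step 2: j0 + 1 is the first index whose entry exceeds its threshold.
    j0 = next(i for i in range(1, i0 + 1) if V(i) < y[i - 1]) - 1
    shift = V(j0 + 1)
    # Step 3: the four cases of the piecewise definition, in order.
    head = [y[k + j0 - 1] - shift for k in range(1, i0 - j0)]
    mid = [yw - shift]
    tail = [y[k + j0 - 2] + 1 - shift for k in range(i0 - j0 + 1, w - j0 + 1)]
    wrapped = [y[k - w + j0 - 1] + n - shift for k in range(w - j0 + 1, w + 1)]
    return head + mid + tail + wrapped
```

For instance, with $n = 7$ and $w = 3$ (so $p = 2$, $q = 1$), the input $(1, 6, 6)$ yields $i_0 = 2$, $j_0 = 1$, $V_{j_0+1} = 5$ and the output $(1, 2, 3)$, while $(1, 4)$ with $n = 5$ is left unchanged.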

The inverse mapping $g_w$ is similarly replaced by a map from
$T_w^n$ to $\{(y_1, y_2, \ldots, y_w) : (y_1, y_2, \ldots, y_{w-1}) \in T_{w-1}^{n-1},\
n - \lfloor n/w \rfloor + 1 \le y_w \le n\}$, defined as follows. Again, assume
$n = pw + q$.
1) Let

$$m_0 := \max\{i \in \{1, \ldots, w\} \mid W_i < y'_i\}, \tag{12}$$

where $W_i := q + (i - 1)p + \min\{i - q - 1, 0\}$.
2) If $m_0 = w$, let $j_0 := 0$, otherwise let

$$j_0 := w - \max\{i \in \{m_0 + 1, \ldots, w\} \mid y'_i \le W_{m_0} + p\};$$

in either case, let $i_0 := j_0 + m_0$.

3) Set $y_k$ equal to:

$$y_k = \begin{cases}
y'_{k+w-j_0} - p - W_{m_0} & \text{for } k = 1, \ldots, j_0, \\
y'_{k-j_0} + n - p - W_{m_0} & \text{for } k = j_0 + 1, \ldots, i_0 - 1, \\
y'_{k-j_0+1} - 1 + n - p - W_{m_0} & \text{for } k = i_0, \ldots, w - 1, \\
y'_{m_0} + n - p - W_{m_0} & \text{for } k = w.
\end{cases}$$

We omit the proofs, since they are similar to those for the 
real-valued case. 
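
The three steps of the integer inverse map can be sketched in the same way. The name `inverse_step` is hypothetical, and the sketch assumes its argument lies in the image of the forward map; for other elements of the orthoscheme the search in Step 2 may find no admissible index and raise an error.

```python
def inverse_step(yp, n):
    """Integer inverse step for weight w = len(yp) and length n (sketch only).

    yp is assumed to be an output of the forward map: strictly increasing
    values in 1..n. Lists are 0-based; comments use the 1-based text indices.
    """
    w = len(yp)
    p, q = divmod(n, w)  # n = p*w + q

    def W(i):
        return q + (i - 1) * p + min(i - q - 1, 0)

    # Step 1: m0 is the largest index i with W_i < y'_i.
    m0 = max(i for i in range(1, w + 1) if W(i) < yp[i - 1])
    # Step 2: recover j0 from the last entry not exceeding W_{m0} + p.
    if m0 == w:
        j0 = 0
    else:
        j0 = w - max(i for i in range(m0 + 1, w + 1) if yp[i - 1] <= W(m0) + p)
    i0 = j0 + m0
    shift = n - p - W(m0)  # the offset n - p - W_{m0} shared by cases 2 to 4
    # Step 3: the four cases of the piecewise definition, in order.
    first = [yp[k + w - j0 - 1] - n + shift for k in range(1, j0 + 1)]
    second = [yp[k - j0 - 1] + shift for k in range(j0 + 1, i0)]
    third = [yp[k - j0] - 1 + shift for k in range(i0, w)]
    last = [yp[m0 - 1] + shift]
    return first + second + third + last
```

With $n = 7$ and $w = 3$, the vector $(1, 2, 3)$ gives $m_0 = 1$, $j_0 = 1$, $i_0 = 2$ and is mapped back to $(1, 6, 6)$.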

F. Comments on the Algorithm 

The overall complexity of the transform algorithm is $O(w^2)$,
because at each induction step the complexity is linear in
the weight at that step. Recall that the complexities of
the arithmetic coding method and Knuth's complementation
method are both $O(n)$. Thus when the weight $w$ is larger
than $\sqrt{n}$, the geometric approach is less competitive. When
the weight is low, the proposed geometric technique is more
efficient, because Knuth's complementation method is not
applicable, while the dissection operations of the proposed
algorithm make it faster than the arithmetic coding method.
Furthermore, due to the structure of the algorithm, it is possible
to parallelize part of the computation within each induction
step to further reduce the computation time.
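
The quadratic bound and the $\sqrt{n}$ crossover can be spelled out in one line. The constant $c$ below is an assumption introduced only for illustration: it stands for a hypothetical bound on the cost per unit of weight at each induction step.

```latex
\sum_{i=2}^{w} c\, i \;=\; c\left(\frac{w(w+1)}{2} - 1\right) \;=\; O(w^2),
\qquad\text{and}\qquad
w^2 \le n \iff w \le \sqrt{n}.
```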

So far little has been said about mapping a binary sequence
to an integer sequence $y_1, y_2, \ldots, y_w$ such that $y_i \in [L_i, U_i]$,
where $L_i$ and $U_i$ are the lower and upper bounds of the valid
range as specified by the algorithm. A straightforward method
is to treat the binary sequence as an integer and then
use the "quotient and remainder" method to find such a mapping.
However, this requires a division operation, and when the
binary sequence is long, the computation is not very efficient.
A simplification is to partition the binary sequence into short
sequences, and map each short binary sequence to a pair of
integers, as in the case of a weight-two constant weight code.
Through proper pairing of the ranges, the loss in the rate can
be minimized.
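
The straightforward "quotient and remainder" method amounts to a mixed-radix expansion. The sketch below is illustrative only: the names `brick_bounds`, `int_to_ranges` and `ranges_to_int` are hypothetical, the bounds follow the brick definition of Section E, and the simplified pairing scheme described above is not modeled.

```python
def brick_bounds(n, w):
    """[L_i, U_i] for i = 1..w, read off the brick definition in Section E."""
    bounds = []
    for i in range(1, w + 1):
        ni = n - (w - i)  # effective length at induction step i
        bounds.append((ni - ni // i + 1, ni))
    return bounds


def int_to_ranges(value, bounds):
    """Map a non-negative integer to (y_1, ..., y_w) with L_i <= y_i <= U_i."""
    ys = []
    for lo, hi in bounds:
        value, r = divmod(value, hi - lo + 1)  # quotient and remainder
        ys.append(lo + r)
    if value:
        raise ValueError("value does not fit in the given ranges")
    return ys


def ranges_to_int(ys, bounds):
    """Inverse of int_to_ranges: rebuild the integer from its digits."""
    value = 0
    for y, (lo, hi) in zip(reversed(ys), reversed(bounds)):
        value = value * (hi - lo + 1) + (y - lo)
    return value
```

For $n = 5$, $w = 2$ the bounds are $[1, 4]$ and $[4, 5]$, so the eight integers $0, \ldots, 7$ (three information bits) map one-to-one onto the brick; each digit costs one division, which is the expense the text refers to.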

The overall rate loss has two components, the first from
the rounding involved in using natural numbers, the second
from the loss in the above simplified translation step. However,
when the weight is on the order of $\sqrt{n}$, and $n$ is in the range
of 100 to 1000, the rate loss is usually 1 to 3 bits per block.
For example, when $n = 529$, $w = 23$, then the rate loss is
2 bits/block compared to the best possible code which would
encode $k_0 = 132$ information bits.
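
The value of $k_0$ and the first (rounding) component of the loss can be computed directly. This is a rough sketch with hypothetical helper names; it measures only the size of the brick from Section E and does not model the additional loss of the simplified translation step.

```python
from math import comb


def ideal_bits(n, w):
    """k0 = floor(log2 C(n, w)): information bits of the best possible code."""
    return comb(n, w).bit_length() - 1


def brick_bits(n, w):
    """floor(log2 |B|), where |B| is the product of the brick's side lengths."""
    size = 1
    for i in range(1, w + 1):
        ni = n - (w - i)  # effective length at induction step i
        size *= ni // i   # side length floor(ni / i)
    return size.bit_length() - 1


k0 = ideal_bits(529, 23)
```

For $n = 529$, $w = 23$ this gives `k0 == 132`, in agreement with the figure quoted above; `brick_bits` can never exceed `ideal_bits`, since the brick is mapped one-to-one into the set of weight-$w$ words.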

V. Conclusion 

We propose a novel algorithm for encoding and decoding
constant weight binary codes, based on dissecting the polytope
defined by the set of all binary words of length $n$ and weight
$w$, and reassembling the pieces to form a hyper-rectangle
corresponding to the input data. The algorithm has a natural
recursive structure, which enables us to give an inductive proof
of its correctness. The proposed algorithm has complexity
$O(w^2)$, independent of the length of the codewords $n$. It is
especially suitable for constant weight codes of low weight.